-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback to use the CUDA primary context #189
Fallback to use the CUDA primary context #189
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we work through the context-getting logic, I am not sure it is right?
cpp/include/kvikio/utils.hpp
Outdated
if (err != CUDA_ERROR_INVALID_VALUE) { | ||
// If this isn't the case, we grab the current context | ||
CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx)); | ||
} else { | ||
CUDA_DRIVER_TRY(err); // Check for errors from previous asynchronous launches | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic is wrong? Let's try and work through it:
The options for the allocation are:
- stream ordered:
CU_POINTER_ATTRIBUTE_CONTEXT
will returnCUDA_SUCCESS
and putctx = nullptr
- not stream ordered:
CU_POINTER_ATTRIBUTE_CONTEXT
will returnCUDA_SUCCESS
and putctx = some_context
- not a UVM allocation:
CU_POINTER_ATTRIBUTE_CONTEXT
will returnCUDA_ERROR_INVALID_VALUE
- Some other thing: some other cuda error
So if we hit case 3 or 4, we want to error, if we hit case 2, we want to return the context that the query returned, if we hit case 1 we need to do some more work: I think we can just use CU_POINTER_ATTRIBUTE_DEVICE_POINTER
query and then do something based on that (either return a non-null current context or else build a primary context and return that).
So perhaps:
CUcontext ctx;
// either the pointer is valid (CUDA_SUCCESS), or it isn't (some other error)
CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute(&ctx, CU_POINTER_ATTRIBUTE_CONTEXT, dev_ptr));
if (ctx != nullptr) {
// not a stream-ordered allocation, this context is fine.
return ctx;
}
// stream-ordered allocation, let's see if the current device context exists and we can use it
CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx));
if (ctx != nullptr) {
CUdeviceptr target{};
auto const err = cudaAPI::instance().PointerGetAttribute(&target, CU_POINTER_ATTRIBUTE_DEVICE_POINTER, dev_ptr);
if (err == CUDA_SUCCESS && target == dev_ptr) return ctx;
}
// Either no context, or the current context can't access the device pointer?
int ord = get_device_ordinal_from_pointer(devPtr);
_primary_contexts.try_emplate(ord, ord);
return _primary_contexts.at(ord).ctx;
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I messed up the equality checks and we don’t need to check the accessibility of devPtr
when CU_POINTER_ATTRIBUTE_CONTEXT
succeed. However, we must survive a CUDA_ERROR_INVALID_VALUE
error.
Co-authored-by: Lawrence Mitchell <[email protected]>
…nto use_cuda_primary_context
cpp/include/kvikio/utils.hpp
Outdated
CUcontext ctx; | ||
const CUresult err = | ||
cudaAPI::instance().PointerGetAttribute(&ctx, CU_POINTER_ATTRIBUTE_CONTEXT, dev_ptr); | ||
if (err != CUDA_ERROR_INVALID_VALUE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we fall through if the result is CUDA_ERROR_INVALID_VALUE
? Ah, I guess if we don't have UVM, this might still be a device pointer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to crash when getting CUDA_ERROR_INVALID_VALUE
so we cannot use CUDA_DRIVER_TRY()
. Instead, we want to get the current context (if any).
cpp/include/kvikio/utils.hpp
Outdated
CUdeviceptr current_ctx_dev_ptr{}; | ||
CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute( | ||
¤t_ctx_dev_ptr, CU_POINTER_ATTRIBUTE_DEVICE_POINTER, dev_ptr)); | ||
if (current_ctx_dev_ptr != dev_ptr) { return ctx; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I don't understand this check. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1g0c28ed0aff848042bc0533110e45820c says:
Returns in *data the device pointer value through which ptr may be accessed by kernels running in the current CUcontext. The type of data must be CUdeviceptr *.
If there exists no device pointer value through which kernels running in the current CUcontext may access ptr then CUDA_ERROR_INVALID_VALUE is returned.
If there is no current CUcontext then CUDA_ERROR_INVALID_CONTEXT is returned.
Except in the exceptional disjoint addressing cases discussed below, the value returned in *data will equal the input value ptr.
So my reading is that if the driver call was successful, then current_ctx_dev_ptr
will either be the same as dev_ptr
(usual case) or different (but still valid) in the corner cases where dev_ptr
was allocated either by cuMemHostRegister
or cuMemHostAlloc
with CU_MEMHOSTALLOC_WRITECOMBINED
.
In either case, the current context is good for accessing the pointer.
So I think:
if (current_ctx_dev_ptr != dev_ptr) { return ctx; } | |
if (current_ctx_dev_ptr == dev_ptr) { return ctx; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If dev_ptr != current_ctx_dev_ptr
, we cannot access dev_ptr
using the current context right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it means that the memory was allocated in one of the special ways listed above. The context is capable of accessing the data, but must do so through current_ctx_dev_ptr
(not dev_ptr
). Since this function is only asking "what context is good for accessing this pointer?" I think we should always return the ctx
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still one quibble about the case where we have a context and have queried the device pointer attribute.
Co-authored-by: Lawrence Mitchell <[email protected]>
Marked the PR as a draft again. I will write some tests tomorrow before we merge |
Co-authored-by: Lawrence Mitchell <[email protected]>
cpp/include/kvikio/utils.hpp
Outdated
CUDA_DRIVER_TRY(cudaAPI::instance().CtxGetCurrent(&ctx)); | ||
if (ctx != nullptr) { | ||
CUdeviceptr current_ctx_dev_ptr{}; | ||
CUDA_DRIVER_TRY(cudaAPI::instance().PointerGetAttribute( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have CUDA_ERROR_INVALID_VALUE
here (the current context is not set up for allowing us to access this pointer). Is that a recoverable error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should certainly handle other errors as failure.
cpp/include/kvikio/utils.hpp
Outdated
// Finally, if we didn't find any usable context, we return the primary context of the | ||
// device that owns `devPtr`. Notice, we use `_primary_contexts` to cache the primary | ||
// context of each device. | ||
int ordinal = get_device_ordinal_from_pointer(dev_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If dev_ptr
was allocated with cuMemHostAlloc
with CU_MEMHOSTALLOC_WRITECOMBINED
, and is the "host address". It looks like a UVM address, but it is unmapped on the device, so cannot be used in a kernel, even with this primary context that we just made. In that case, the pointer that is usable on the device is the one obtained from querying CUDA_POINTER_ATTRIBUTE_DEVICE_POINTER
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After setting the primary context, I think if we go round and query CUDA_POINTER_ATTRIBUTE_DEVICE_POINTER
again and find that the result is different from the input, then we get an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should try and clarify some more with the cuda team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just some minor debugging leftovers to clean up I think
Co-authored-by: Lawrence Mitchell <[email protected]>
Thanks for the help @wence- ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @madsbk! How shall we test that this works for the initial reported cudf bug?
In |
I was going to just build locally and check? |
Ahh sure, I have done that continually :) To built cudf using this branch, add the KvikIO source in the diff --git a/build.sh b/build.sh
index bee66d819b..b6bcf5ce75 100755
--- a/build.sh
+++ b/build.sh
@@ -288,6 +288,7 @@ if buildAll || hasArg libcudf; then
-DDISABLE_DEPRECATION_WARNINGS=${BUILD_DISABLE_DEPRECATION_WARNINGS} \
-DCUDF_USE_PER_THREAD_DEFAULT_STREAM=${BUILD_PER_THREAD_DEFAULT_STREAM} \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
+ -DCPM_KvikIO_SOURCE=/home/mkristensen/repos/kvikio \
${EXTRA_CMAKE_ARGS}
cd ${LIB_BUILD_DIR} |
Can confirm this fixes the reported cudf issue |
/merge |
Fixes rapidsai/cudf#13019